A study of polysemy judgements and inter-annotator agreement
Abstract
This paper describes two experiments on polysemy judgement and sense annotation. The first experiment enabled us to select the most polysemous words, which were used in the second experiment and which serve as test words for the evaluation of word sense disambiguation (WSD) systems. We show that this selection method yields results different from selecting words on the basis of their number of senses in a dictionary, and that it is more appropriate in this context. Both experiments show considerable disagreement among the human judges. Disagreement on sense annotation is particularly concerning, since it casts some doubt on the very possibility of evaluating WSD systems. However, we show that much of the disagreement is due to the excessively fine granularity of sense definitions in dictionaries. We also show that clustering techniques can be used to re-code the individual annotations according to a small set of "super-tags" and so obtain a satisfactory level of inter-annotator agreement. Surprisingly, the natural clustering provided by lexicographers in the hierarchy of dictionary entries does not substantially reduce disagreement, whereas "blind" data-oriented techniques produce satisfactory results. Beyond its practical goals, this paper therefore raises more general questions about lexicographic practice and the adequacy of dictionaries for NLP tasks.
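The following minimal Python sketch illustrates why re-coding fine-grained sense tags into super-tags can raise inter-annotator agreement. It is not the paper's procedure and makes several assumptions. First, it uses Cohen's kappa for two judges as the agreement measure, since the abstract does not name a coefficient. Second, it uses a hand-written, hypothetical sense-to-super-tag mapping, whereas the paper derives its super-tags by clustering, which this sketch does not attempt. Third, the names `SUPERTAG`, `recode`, `JUDGE_A` and `JUDGE_B` and the toy annotations are all invented for illustration.

```python
from collections import Counter

# Hypothetical mapping from fine-grained sense tags to super-tags.
# (Assumption: written by hand here; the paper obtains super-tags by clustering.)
SUPERTAG = {"1a": "1", "1b": "1", "2a": "2", "2b": "2"}


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two judges who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items both judges tagged identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, estimated from each judge's own tag frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[t] * cb[t] for t in ca) / (n * n)
    if p_e == 1.0:
        # Both judges used one and the same tag throughout; treat as perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def recode(tags: list[str], supertag: dict[str, str]) -> list[str]:
    """Replace each fine-grained tag by its super-tag (unmapped tags are kept)."""
    return [supertag.get(t, t) for t in tags]


# Invented annotations of eight occurrences of one word by two judges.
JUDGE_A = ["1a", "1b", "1a", "2a", "2b", "2a", "1b", "2b"]
JUDGE_B = ["1b", "1a", "1a", "2b", "2b", "2a", "1a", "1a"]

if __name__ == "__main__":
    fine = cohen_kappa(JUDGE_A, JUDGE_B)
    coarse = cohen_kappa(recode(JUDGE_A, SUPERTAG), recode(JUDGE_B, SUPERTAG))
    print(f"fine-grained kappa: {fine:.2f}")    # fine-grained kappa: 0.17
    print(f"super-tag kappa:    {coarse:.2f}")  # super-tag kappa:    0.75
```

On these invented labels, most disagreements fall between sub-senses of the same top-level sense, so kappa rises from about 0.17 to 0.75 once both judges' tags are re-coded. Kappa corrects for chance agreement, which grows as the tag set shrinks. A coarser tag set therefore raises kappa only when it actually absorbs disagreements, not merely because there are fewer tags.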
Similar resources
Commitments to Preferences in Dialogue
We propose a method for modelling how dialogue moves influence and are influenced by the agents’ preferences. We extract constraints on preferences and dependencies among them, even when they are expressed indirectly, by exploiting discourse structure. Our method relies on a study of 20 dialogues chosen at random from the Verbmobil corpus. We then test the algorithm's predictions against the jud...
Human Judgements on Causation in French Texts
The annotation of causal relations in natural language texts can lead to a low inter-annotator agreement. A French corpus annotated with causal relations would be helpful for the evaluation of programs that extract causal knowledge, as well as for the study of the expression of causation. As previous theoretical work provides no necessary and sufficient condition that would allow an annotator t...
An Agreement Measure for Determining Inter-Annotator Reliability of Human Judgements on Affective Text
An affective text may be judged to belong to multiple affect categories, as it may evoke different affects with varying degrees of intensity. For affect classification of text, it is often necessary to annotate a text corpus with affect categories. This task is often performed by a number of human judges. This paper presents a new agreement measure inspired by the kappa coefficient to compute inter-anno...
Cro36WSD: A Lexical Sample for Croatian Word Sense Disambiguation
We introduce Cro36WSD, a freely available, medium-sized lexical sample for Croatian word sense disambiguation (WSD). Cro36WSD comprises 36 words: 12 adjectives, 12 nouns, and 12 verbs, balanced across both frequency bands and polysemy levels. We adopt the multi-label annotation scheme in the hope of lessening the drawbacks of discrete sense inventories and obtaining more realistic annotations fr...
Tree-Cut and a Lexicon Based on Systematic Polysemy
This paper describes a lexicon organized around systematic polysemy: a set of word senses that are related in systematic and predictable ways. The lexicon is derived by a fully automatic extraction method which utilizes a clustering technique called tree-cut. We compare our lexicon to WordNet cousins, and the inter-annotator disagreement observed between WordNet Semcor and DSO corpora.
Publication date: 1998